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Abstract 

A  specialized  formulation  of  Azarbayejani  and  Pentland's 
framework  for  recursive  recovery  of  motion,  structure  and 
focal  length  from  feature  correspondences  tracked  through 
an  image  sequence  is  presented.  The  specialized  formula¬ 
tion  addresses  the  case  where  all  tracked  points  lie  on  a 
plane.  This  planarity  constraint  reduces  the  dimension  of 
the  original  state  vector,  and  consequently  the  number  of 
feature  points  needed  to  estimate  the  state.  Experiments 
with  synthetic  data  and  real  imagery  illustrate  the  system 
performance.  The  experiments  confirm  that  the  specialized 
formulation  provides  improved  accuracy,  stability  to  ob- 
ser\’ation  noise,  and  rate  of  convergence  in  estimation  for 
the  case  where  the  tracked  points  lie  on  a  plane. 

1  Introduction 

Inferring  3D  structure  and  motion  from  2D  image  se¬ 
quences  has  been  a  central  problem  in  computer  vision  for 
many  years.  Many  early  studies  focused  on  methods  of  re¬ 
lating  pixel  coordinates  to  3D  coordinates  via  camera  cali¬ 
bration  [22],  that  is  computing  the  projection  matrix  which 
relates  image  coordinates  to  a  world  coordinate  frame.  In 
recent  years,  the  focus  has  shifted  to  non-metric  recon¬ 
struction  from  uncalibrated  cameras  [9],  by  computing  the 
fundamental  matrix  (two  views)  [12],  and  the  trilinear  ten¬ 
sor  (three  views)  [16].  Also,  different  camera  models  were 
assumed;  i.  e. ,  orthographic  [20,  23],  perspective  projection 
[11,  25],  or  a  unified  model  [1,  15]. 

Structure  and  motion  algorithms  typically  assume  given 
correspondences  between  features  in  successive  frames. 
Finding  such  correspondences  in  a  reliable  way  is  a  prob¬ 
lem  that  still  occupies  researchers  in  the  field.  The  most 
common  approaches  used  to  solve  this  problem  are  meth¬ 
ods  based  on  optical  flow  [8,  13]  and  methods  based  on 
feature  matching  [21,  26].  In  flow-based  methods,  a  veloc¬ 
ity  vector  is  computed  for  each  pixel  in  the  region  of  inter¬ 
est  using  variational  techniques.  In  feature-based  methods, 
image  features  such  as  points  and  lines  are  extracted  in  the 
first  frame  and  are  then  matched  to  corresponding  features 
in  successive  frames  using  correlation  and  relaxation  tech¬ 
niques.  In  both  types  of  methods  it  is  typically  assumed 
that  the  intensity  of  an  image  point,  given  a  small  motion 
between  successive  frames,  will  remain  constant. 

When  dealing  with  a  sequence  of  images,  as  with  any  se¬ 
quence  of  observations,  there  are  two  possible  frameworks 
to  consider  for  parameter  estimation.  In  a  batch  framework 
[14],  observations  from  all  frames  are  used  simultaneously 


to  estimate  the  state.  In  a  recursive  framework  [3,  1,  17,  2] 
the  current  estimate,  based  on  previous  frames,  together 
with  new  observations  from  the  current  frame  yield  a  new 
state  estimate.  A  batch  framework  is  more  suitable  when 
all  the  information  is  available  ahead  of  time,  while  a  re¬ 
cursive  framework  is  more  suitable  for  real-time  systems. 

Algorithms  for  recovering  structure  and  motion  have 
many  practical  applications,  such  as  reverse  engineering, 
virtual  reality,  movie  special  effects,  computer  aided  de¬ 
sign,  image  compression,  etc.  Most  of  these  algorithms 
are  general  in  the  sense  that  they  assume  no  prior  knowl¬ 
edge  about  the  scene.  However,  in  practice,  the  scene  typi¬ 
cally  contains  structures  with  strong  geometric  regularities. 
In  particular,  lines  and  planes  occur  frequently  in  real  im¬ 
agery  and  in  very  particular  orientations  [19].  Planar  sur¬ 
faces  are  quite  common  in  both  indoor  and  outdoor  envi¬ 
ronments.  Planar  man-made  structures  such  as  table  tops, 
floors,  walls  as  well  as  buildings,  roads,  pavements,  and 
playgrounds  occur  frequently  in  real  image  sequences. 


1.1  New  Approach 

The  goal  in  this  paper  is  to  reformulate  the  general  recur¬ 
sive  framework  for  pointwise  structure  recovery,  in  such  a 
way  that  it  takes  into  account  a  planarity  constraint.  This 
reformulation  results  in  a  smaller  state  vector,  and  conse¬ 
quently  in  a  more  accurate  and  stable  system.  The  formu¬ 
lation  is  based  upon  the  extended  Kalman  filter  (EKF)  ap¬ 
proach  originally  proposed  in  [1],  Experiments  with  syn¬ 
thetic  data  and  real  imagery  will  be  used  to  illustrate  the 
system  performance.  The  experiments  confirm  that  the 
specialized  formulation  provides  improved  accuracy,  sta¬ 
bility  to  observation  noise,  and  rate  of  convergence  in  esti¬ 
mation  for  the  case  where  the  tracked  points  lie  on  a  plane. 

Although  two  views  analyses  of  planar  structure  were 
carried  out  in  [10,  25,  6],  these  algorithms  are  known  to 
be  sensitive  to  measurement  noise.  More  recently,  Szeliski 
and  Torr  [19]  showed  how  the  quality  of  reconstruction  can 
be  improved  via  use  of  planarity  constraints.  In  addition, 
Dellaert,  et  al.  [4]  demonstrated  planar  structure  recovery 
through  the  inclusion  of  texture  maps  in  an  EKF  measure¬ 
ment  model;  however,  this  approach  assumed  that  the  plane 
in  the  initial  frame  was  front-facing.  The  approach  pro¬ 
posed  in  this  paper  models  planar  structure  explicitly  and 
does  not  make  any  assumption  about  the  initial  orientation 
of  the  plane  with  respect  to  the  camera. 
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2  Background 

In  this  section  we  give  a  brief  overview  of  a  recursive  esti¬ 
mation  framework  (EKF)  to  recover  motion,  structure  and 
focal  length  from  a  sequence  of  images,  given  correspon¬ 
dences  of  feature  points  between  frames.  The  formulation 
is  due  to  Azarbayejani  and  Pentland  [1],  and  will  serve  as 
the  basis  our  new  formulation  for  planar  structure  recovery. 

In  this  formulation,  state  vector  of  the  EKF  consists  of 
7  +  IV  parameters,  three  for  translation,  three  for  incremen¬ 
tal  rotation,  one  for  camera  focal  length,  and  N  for  depth 
of  the  feature  points.  The  state  vector  is  written  as  follows: 

x  =  ( tx.,ty ,  tzd,  lux,uy,  uz,  T-  (i| .  •  •  • .  a  at)  (1) 

where  j3  is  the  inverse  focal  length,  (tx ,  tv,  tz[3)  is  the  rel¬ 
ative  translational  motion,  and  (w^uy,  coz)  describes  the 
incremental  rotational  motion.  Finally,  point-wise  struc¬ 
ture  is  given  by  the  subvector  (ai ,  •  •  •  ,ajv),  where  a:,;  is 
the  depth  associated  with  the  ith  feature  point.  Depth  is 
expressed  relative  to  the  coordinate  system  of  the  camera 
in  the  first  frame.  Note  that  depth  can  be  recovered  up  to 
a  scale  factor  only;  therefore,  it  is  customary  to  fix  one  of 
the  a  -,  for  purposes  of  gaining  a  solution  [1]. 

It  should  be  noted  that  rotation  is  formulated  in  terms  of 
the  incremental  rotation  quaternion: 

Sq  =  (y/r=l,uxj2,u>Y/2,Wz/2)  (2) 

e  =  (ujx2  +  ujy2  +  u>z2)/4:.  (3) 

(',From  the  unit  incremental  rotation  quaternion  <5q,  the 
global  rotation  matrix  R  can  be  computed  as  described  in 
[1].  This  gives  the  relative  rotation  between  the  object  ref¬ 
erence  frame  and  the  current  camera  reference  frame. 

The  measurement  vector  of  the  EKF  is  given  by 

Z  =  (</|.,Pj:.  «2:  l'-2.  ■  ■  ■  ■.  U\  .  V\  I  (4) 

where  (ui ,  Vj)  is  the  image  location  of  the  ith  feature  point, 
and  i  =  1..N  where  N  is  the  number  of  features.  Thus,  the 
measurement  vector  of  the  EKF  consists  of  2 N  parameters. 

The  following  equations  capture  the  geometry  of  the 
problem  [1],  The  first  equation,  relates  the  3-D  location 
of  a  single  point  (X,  Y,  Z )  to  its  2-D  image  location  in  the 
first  frame  (it1 ,  ii1): 


where  a  is  the  depth  (or  structure),  and  ;1  =  1//  is  the 
inverse  focal  length. 

The  coordinate  frame  transformation  between  the  first 
frame  and  the  current  frame  is  formulated  as: 


(6) 


where  R  is  the  rotation  matrix  and  (tx  ,ty  ,tz  3)T  is  the 
translation,  as  described  above. 

Finally,  the  camera  model  for  central  projection  is  for¬ 
mulated  as  follows: 


Xc  \  1 

ib  )  \  ■  z,  r 


(7) 


where  the  coordinate  system  origin  is  fixed  at  the  image 
plane,  rather  than  at  the  center  of  projection. 

The  measurement  equation  for  the  EKF  is  obtained  by 
combining  the  Eqs.  5,  6,  and  7.  For  more  details  see  [1]. 

A  known  weakness  of  this  formulation  is  that  it  assumes 
that  the  image  coordinates  at  the  first  frame  are  correct,  an 
assumption  that  in  theory  biases  the  results,  but  has  very 
small  effect  in  practice.  A  possible  remedy  to  this  problem 
can  be  found  in  [2]. 

The  computational  complexity  of  the  algorithm  is  cubic 
with  respect  to  the  number  of  points.  In  other  methods 
[2]  a  seperate  filter  is  run  for  every  3D  point  resulting  in 
a  linear  algorithm.  However,  the  motion  and  focal  length 
are  not  modeled  explicitly  as  part  of  the  state,  but  rather 
are  implicit  in  the  projection  matrix  which  is  estimated  in 
a  separate  stage. 


3  Planar  Structure  Recovery 

Given  the  above  formulation,  we  can  now  add  a  constraint 
for  all  points  to  lie  on  a  single  plane.  The  measurement 
vector  of  the  EKF  remains  the  same,  while  the  new  state 
vector  becomes: 


x  =  (tx,tY,izl3,uxiu>y,uz,l3,  Nx,  Ny,Nz)-  (8) 

The  state  now  consists  of  only  7  +  3  =  10  parameters, 
the  first  seven  parameters  are  as  in  the  original  formula¬ 
tion,  and  the  other  three  represent  the  plane  parameters. 
Of  course,  a  plane  is  completely  determined  by  three  non- 
collinear  points,  or  equivalently  by  its  unit  normal  (two 
DOFs)  and  its  distance  from  the  origin  (one  DOF). 

Note  that  the  dimension  of  the  new  state  vector  is  inde¬ 
pendent  of  the  number  of  features  points.  We  expect  that 
as  the  number  of  feature  points  grows  larger,  our  estimator 
will  out-perform  the  previous  estimator  since  the  latter's 
dimension  grows  with  the  number  of  feature  points. 

If  the  points  lie  on  a  plane  then  they  satisfy  the  plane 
equation: 

N  •  X  =  1  (9) 

where  X  =  (X,  Y,  Z)  is  the  3-D  point  location  and  N  = 
(Nx,  Ny .  Nz )  is  the  plane  (non-unit)  normal,  and  the  dis¬ 
tance  of  the  plane  from  the  origin  is  given  by  d  =  \  \  N\  \ _1 . 
Rearranging  9  we  get: 

Z=1-NXX-NYV  m 

IMz 
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Since  a  =  Z.  we  can  rewrite  Eq.  5  using  Eq.  10  : 


and  rotation),  the  state  vector  of  Eq.  8  becomes: 


Rearranging 


1  -  N x  X  -  NyY 

+  N~z 


ul/3 

v'p 

1 


1 1  to  solve  for  ( X ,  Y,  Z)  we  get: 


/  X  \  (  (NZ  +  flu1  \ 

\  Y  =  I  (NZ  +  3V  ,  (12) 

\  Z  J  \  1  -  Nxu1  -  Nyv 1  J 


where  r]  =  1  /(Nxu1 3  +  Nyv1 3  +  Nz). 

As  in  the  original  formulation,  the  dynamic  model  in  the 
EKF  can  be  chosen  trivially  as  an  identity  transform  plus 
noise.  The  measurement  model  is  given  by  Eqs.  12,  6, 
and  7.  For  derivation  of  the  measurement  Jacobian,  see 
Appendix  5.  Note  that  depth  can  be  recovered  up  to  a 
scale  factor  only;  therefore,  it  is  necessary  to  keep  one  of 
the  Nj  fixed  for  purposes  of  gaining  a  solution. 


3.1  Relation  to  Two  Frames  Analysis 

In  this  section,  we  discuss  the  degree  of  determinacy  of  our 
system  and  compare  it  with  the  system  presented  in  [1], 
In  general,  the  parameters  of  a  system  can  be  recovered 
when  the  number  of  constraints  outnumbers  the  degrees  of 
freedom,  or  equivalently,  the  number  of  parameters  in  the 
system.  The  number  of  constraints  in  the  original  point- 
wise  structure  formulation  is  (1  +  2 N):  one  for  scale  and 
2 N  for  number  of  measurements  (u  and  v  coordinates  of 
A  feature  points).  The  number  of  degrees  of  freedom  is 
(6  +  1  +  Ar)  for  motion,  camera  and  structure  at  every 
frame.  So,  whenever  (1  +  2 N)  >  (1  +  6  +  Ar)  or  N  >  7, 
the  motion,  structure  and  camera  can  be  recovered. 

In  our  formulation,  the  number  of  constraints  remains 
the  same,  while  the  number  of  parameters  to  recover  is  re¬ 
duced  to  (6  +  1  +  3)  =  10  for  camera  motion  and  planar 
structure  at  every  frame.  Note  that  this  number  is  fixed  and 
does  not  depend  on  the  number  of  feature  points  Ar.  So, 
whenever  1  +  2AT  >  10  or  N  >  5,  the  motion,  camera 
and  planar  structure  can  be  recovered.  If  we  work  with  a 
normalized  camera  model,  that  is,  /  =  1  (3  =  1),  then  the 
number  of  constraints  increases  to  2 N  +  2  and  we  can  re¬ 
cover  motion  and  planar  structure  whenever  N  >  4.  This 
result  coincides  with  the  “four  corner”  model  [7,  25]  or 
the  “eight  parameter”  model  [18]  for  esimating  motion  and 
planar  structure  from  two  perspective  views. 


3.2  Motion  of  Multiple  Planes 

It  is  possible  to  extend  the  formulation  for  estimation  of  a 
single  plane  to  the  case  of  multiple  planes.  Assume  that 
there  are  to  planes.  Also  assume  that  each  image  feature  is 
assigned  to  its  corresponding  plane.  In  the  case  where  the 
environment  is  rigid  (  all  planes  undergoe  same  translation 


X\,N\-,  N\ ,  •  •  •  N™,  N?,  N™),  (13) 

where  the  superscripts  denote  the  ith  plane  parameters. 
The  state  now  consists  of  7  +  3 to  parameters,  where  the 
first  seven  parameters  are  as  in  the  original  formulation 
and  the  others  represent  the  three  parameters  for  each  of 
the  to  planes.  In  this  case,  the  translation,  rotation,  and  fo¬ 
cal  length  are  the  same  for  all  planes;  however,  it  is  also 
possible  to  formulate  the  state  vector  such  that  each  plane 
has  its  own,  independent  motion  and  translation. 


4  Experiments 

We  now  present  two  experiments  for  comparing  our  for¬ 
mulation  with  the  original  one.  Planar  structure  for  the 
original  formulation  is  obtained  by  fitting  a  plane  to  the 
recovered  3D  points. 

In  the  first  experiment  we  evaluate  system  performance 
with  different  noise  levels,  in  a  synthetic  data  setup  simi¬ 
lar  to  that  used  by  [1].  The  test  sequence  consists  of  100 
frames.  The  camera  motion  is  predetermined  for  the  entire 
sequence,  and  corresponds  to  a  rotation  about  the  y-axis  lo¬ 
cated  at  (0, 0,  2).  The  true  focal  length  is  set  to  one,  /3  =  1 
and  the  true  structure  consists  of  some  30  points  uniformly 
scattered  on  a  plane,  with  a  fixed  (non-unit)  normal  direc¬ 
tion  (Nx,  Ny,Nz)  =  2(^=,  0, 1).  The  initial  motion  pa¬ 
rameter  estimates  are  set  to  zero,  with  variances  zero.  The 
initial  inverse  focal  length  is  set  to  0.5,  with  variance  0.1. 
The  components  of  the  plane  normal  direction  are  set  to 
(1.5,  0.5,  2)  with  variances  (0.1,  0.1,  0).  In  each  trial,  uni¬ 
form  noise  with  varying  standard  deviation  is  added  to  both 
x  and  y  image  coordinates.  The  standard  deviation  corre¬ 
sponds  to  2,  6,  and  10  pixels,  based  on  an  image  size  of 
(512,512). 

Fig.  1  illustates  recovery  of  planar  structure,  camera  mo¬ 
tion,  and  focal  length  for  the  three  noise  levels.  Multiple 
trials  (twenty  at  each  noise  level)  were  conducted,  and  the 
average  estimates  were  plotted  on  the  graphs.  Parameters 
t  x  ,  qo,  Yx ,  and  3  are  represented  by  solid  lines  on  the 
graphs;  tz ft,  Ny  q-2  are  represented  by  broken  lines.  To 
avoid  clutter  in  the  graphs  we  do  not  show  ty,  q\,  q-2,  and 
Nz. 

Table  1  shows  a  summary  of  statistics  for  the  experi¬ 
ment.  As  can  be  seen  in  both  the  graphs  and  the  table, 
the  new  formulation  converges  faster  to  the  planar  structure 
and  camera  parameter  estimates.  In  addition,  the  new  for¬ 
mulation  tends  to  be  more  stable  as  noise  levels  increase; 
the  mean  error  and  variance  in  estimating  the  plane  normal 
and  the  camera  parameters  are  both  smaller  when  the  new 
formulation  is  employed. 
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Uniform  noise  ±6  pixels. 

Method  I  Translation  (tx,  ty,  tz/3)  I  Rotation  (q0 ,qi,q2,  qs)  I  Structure  (Wx ,  Ny ,  Nz)  I  Camera  C/3) 


Uniform  noise  ±10  pixels. 

Method  I  Translation  (tx,  ty,  tz/3)  I  Rotation  (q0 ,qi,q2,  qaj  I  Structure  (Wx ,  Ny ,  Nz)  I  Camera  (f3) 


Figure  1 :  Experiment  using  synthetic  data  with  random  noise  added.  Accuracy  and  convergence  of  the  new  vs.  old  formulations  was 
measured  against  ground  truth,  as  described  in  the  text.  Multiple  trials  (twenty  at  each  noise  levels)  were  conducted,  and  the  average 
estimates  shown  on  the  graphs.  Each  graph's  x-axis  is  the  frame  number  and  the  y-axis  is  the  state  variable.  Parameters  tx,  qo,  Nx,  and 
jj  are  represented  by  solid  lines  on  the  graphs;  tz/3,  A V  <72  are  represented  by  broken  lines.  To  avoid  clutter  in  the  graphs  we  do  not  show 
ty,  q\,qi,  and  Nz.  For  a  summary  of  statistics,  see  Table  1. 


4 


Performance  Statistics:  New  Formulation 


Noise 

Motion  Estimation  Error 

|  Convergence  | 

pixels 

mt 

C Tt 

mq 

<Tq 

rs 

rc 

2 

-0.0012 

0.0129 

-0.0037 

0.0298 

i 

9 

6 

-0.0018 

0.0304 

0.0009 

0.0584 

i 

14 

10 

-0.0033 

0.0634 

-0.0058 

0.3352 

7 
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Performance  Statistics:  Old  Formulation 


Noise 

Motion  Estimation  Error 

|  Convergence  | 

pixels 

mt 

(Tt 

mq 

aq 

rs 

rc 

2 

0.0002 

0.0325 

-0.0638 

0.3195 

12 

25 

6 

0.0012 

0.0790 

-0.0810 

0.3274 

14 
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Table  1 :  Average  performance  statistics  for  synthetic  data  exper¬ 
iments  with  increasing  noise  level.  Experiments  were  conducted 
in  trials  with  varying  uniform  noise  (standard  deviation  2,  6,  and 
10  pixels).  Mean  error  and  root  mean  squared  error  are  shown  for 
the  recovered  camera  motion  parameters  (translation,  rotation). 
For  the  static  parameters  (structure  and  camera)  the  table  provides 
the  frame  number  for  which  the  camera  parameters  converge  to 
within  5%  of  the  true  value,  and  frame  number  for  which  the  nor¬ 
mal  converges  to  within  0.5°  of  its  true  value. 


Figure  2:  Example  frames  from  the  real  image  sequence. 
Tracked  features  are  shown  in  white. 

4.1  Experiments  on  Real  Imagery:  Box  sequence 

In  this  section,  we  describe  experiments  with  the 
BOX  sequence  available  from  the  UMass  database  [5], 
Fig.  2  shows  example  frames  from  the  sequence.  Cor¬ 
ner  features  were  tracked  using  an  implementation  of 
the  Kanade-Lucas-Tomasi  feature  tracker  available  from 
http://vision.stanford.edu/birch/klt/.  The  corner  features 
on  the  front  face  of  the  box  were  tracked  and  used  as  mea¬ 
surement  input  to  both  the  new  and  old  EKF  formulations. 
As  in  the  previous  experiments,  planar  structure  for  the 
original  formulation  was  obtained  by  fitting  a  plane  to  the 
recovered  3D  points. 

Graphs  showing  the  estimated  translation,  rotation, 
structure,  and  inverse  focal  length  for  both  formulations 
are  shown  in  Fig.  3.  The  ground  truth  for  each  translation 
and  rotation  parameter  for  this  sequence  lies  approximately 
along  a  line  [1],  As  can  be  seen  in  the  graphs  in  Fig.  3, 
the  estimates  of  camera  motion  obtained  by  the  new  EKF 
formulation  tend  to  converge  faster.  A  more  pronounced 
difference  in  convergence  can  be  seen  in  the  estimated  of 
the  planar  structure.  At  the  time  of  this  writing,  the  au¬ 
thors  have  been  unable  to  obtain  the  “ground  truth”  for  the 
Box  sequence.  Quantitative  RMS  error  comparisons  will 
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Figure  3:  Graphs  showing  the  estimated  translation,  rotation, 
structure,  and  inverse  focal  length  estimated  in  the  UMass  box 
sequence.  Features  belonging  to  the  front  face  of  the  box  were 
used  as  measurement  input  to  the  new  and  old  EKF  formulation. 

be  reported  in  the  final  version  of  the  paper. 

5  Conclusion 

We  have  presented  a  specialization  of  the  Azarbayejani  and 
Pentland  feature -based  recursive  estimator,  to  the  case  of 
planar  structure.  We  have  shown  how  adding  this  geomet¬ 
ric  constraint  reduces  the  dimension  of  the  state  vector,  and 
consequently  yields  a  more  numerically  stable  estimator. 
Since  planar  surfaces  are  quite  common  in  man-made  en¬ 
vironments,  this  new  formulation  should  prove  valuable.  It 
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is  likely  that  this  idea  of  adding  a  geometric  constraint  can 
be  carried  over  to  other  surfaces  which  have  the  analytical 
form  z  =  fix.  y).  In  future  work  we  plan  to  extend  this 
approach  to  the  case  of  quadric  surfaces,  which  also  occur 
frequently  in  many  real  image  sequences. 
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Appendix:  Measurement  Jacobian 

Using  the  notation  of  [24],  the  measurement  equation  of 

the  EKF  is  given  by 

zk=h(xk,vk) 

where  z  is  the  measurement  vector,  x  is  the  state  vector, 

v  is  the  measurement  noise  and  superscript  k  is  the  frame 
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number.  The  measurement  function  h  is  given  by  combin¬ 
ing  Eqs.  6,7,  and  12.  The  measurement  Jacobian  is  given 
by 


dhi  rk 

-0)- 


The  partials  with  respect  to  translation  are  as  follows: 
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The  partials  with  respect  to  incremental  rotation  are  as  fol¬ 
lows: 
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The  partials  of  the  incremental  rotation  matrix  with  respect 


to  u>j  are  as  follows: 
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Here,  7  =  \/l  —  e  and  e  is  given  in  Eq.  3.  R%q  is  the  in¬ 


terframe  rotation  matrix  and  Rq  is  the  global  rotation  ma¬ 
trix,  that  is  the  rotation  of  the  camera  from  the  first  frame 
to  the  predecessor  of  the  current  frame.  Rq  is  updated  ev¬ 
ery  frame  by  multiplying  the  interframe  and  global  rotation 
matrices.  R^+1  =  RgqRq. 

The  partials  with  respect  to  d  are: 


where 
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and  where  p  =  1  /(Nxud  +  Nyvd  +  Nz)- 

The  partials  with  respect  to  the  plane  parameters  are: 
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where  j  =  (X .  Y.  Z),  and 


The  partials  of  3-D  position  with  respect  to  the  plane  pa¬ 
rameters  are: 
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